Creating Reports in Excel

In this notebook we'll use two methods to create Excel reports from a MongoDB query:

  1. The to_excel function of Pandas
  2. The xlsxwriter Python library

The first method is a very easy way to produce an Excel file, while the second allows for a greater amount of formatting.

Method 1

Method 1 uses the to_excel function of Pandas.

Use the mongoimport utility in the bin folder to import the data into MongoDB. Be sure to change the path to your data file so this command works.

./bin/mongoimport --db wsdm --collection uk_accidents --type csv --headerline --file './data/Stats19-Data1979-2004/Accidents7904.csv' --numInsertionWorkers 5

In [ ]:
# Import the Python libraries we need
from pymongo import MongoClient
import pandas as pd
from time import strftime

In [ ]:
# Create a connection to MongoDB
client = MongoClient('localhost', 27017)
db = client.wsdm
collection = db.uk_accidents

In [ ]:
# Get the first 1000 records where the accident happened on a Friday
data = collection.find({"Day_of_Week": 6}).limit(1000)

In [ ]:
# Create a new DataFrame from the MongoDB query
df = pd.DataFrame(list(data))
# Show the first 5 rows
df.head()

In [ ]:
# Delete the _id column as we don't need it and we can't write to the Excel file with it
# This is the BSON Object ID from MongoDB
df = df.drop(['_id'], axis=1)
df.head()

In [ ]:
# Create a variable to hold the path to our file
base_path = '../reports/'
report_file_name = strftime("%Y-%m-%d") + " Accidents Report.xlsx"
report_file = base_path + report_file_name

In [ ]:
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(report_file, engine='xlsxwriter')

# Use the to_excel function to write the file
df.to_excel(writer,
            sheet_name='Accidents',
            header=True,
            index=False,
            na_rep='')

# Close the Pandas Excel writer and output the Excel file.
writer.save()

Method 2

Method 2 uses the Python xlsxwriter library


In [ ]:
# Import the Python libraries we need
from pymongo import MongoClient
from time import strftime
import xlsxwriter

In [ ]:
# Create a connection to MongoDB
client = MongoClient('localhost', 27017)
db = client.wsdm
collection = db.uk_accidents

In [ ]:
# Create a variable to hold the path to our file
base_path = "../reports/"
report_file_name = strftime("%Y-%m-%d") + " Accidents Report Method 2.xlsx"
report_file = base_path + report_file_name

In [ ]:
# Set up our Excel workbook
accident_report = xlsxwriter.Workbook(report_file, {'constant_memory': True,
                                                    'default_date_format': 'mm/dd/yy'})

In [ ]:
# Add some formats to the Excel file
xl_header_format = accident_report.add_format()
xl_header_format.set_bold()

xl_missing_format = accident_report.add_format()
xl_missing_format.set_bg_color('red')

In [ ]:
# Iterators we'll need to loop through the data
e_row = 0
e_col = 0

In [ ]:
# Create the worksheet
worksheet = accident_report.add_worksheet('Accidents')

In [ ]:
# Get the keys from the collection to use as the header of the file
headers = []
# Get a single record from the collection
doc = collection.find_one()
# Iterate through the doc and add the keys to the array
for key in doc:
    headers.append(key)
# Delete the _id column
headers.remove('_id')

In [ ]:
# Add the sheet header
i = 0
for header in headers:
    worksheet.write(e_row, e_col + i, headers[i], xl_header_format)
    i += 1

# Add one so when we start adding the data we start at the next row in the spreadsheet 
e_row += 1

In [ ]:
# Get the first 1000 records where the accident happened on a Friday
data = collection.find({"Day_of_Week": 6}).limit(1000)

In [ ]:
for doc in data:
    e_col = 0
    for value in headers:
        worksheet.write(e_row, e_col, doc[value])
        e_col += 1
    e_row += 1

In [ ]:
# Close the file
accident_report.close()

In [ ]: